8 research outputs found

    The state of SQL-on-Hadoop in the cloud

    Managed Hadoop in the cloud, especially SQL-on-Hadoop, has been gaining attention recently. On Platform-as-a-Service (PaaS), analytical services like Hive and Spark come preconfigured for general-purpose use and ready to run, giving companies quick entry to, and on-demand deployment of, ready-made SQL-like solutions for their big data needs. This study evaluates cloud services from an end-user perspective, comparing providers including Microsoft Azure, Amazon Web Services, Google Cloud, and Rackspace. The study focuses on the performance, readiness, scalability, and cost-effectiveness of the different solutions at entry/test-level cluster sizes. Results are based on over 15,000 Hive queries derived from the industry-standard TPC-H benchmark. The study is framed within the ALOJA research project, which features an open-source benchmarking and analysis platform that has recently been extended to support SQL-on-Hadoop engines. The ALOJA Project aims to lower the total cost of ownership (TCO) of big data deployments and to study their performance characteristics for optimization. The study benchmarks cloud providers across a diverse range of instance types and uses input data scales from 1 GB to 1 TB in order to survey the popular entry-level PaaS SQL-on-Hadoop solutions, thereby establishing a common results base upon which the project can carry out subsequent research. Initial results already show the main performance trends with respect to hardware and software configuration and pricing, as well as the similarities and architectural differences of the evaluated PaaS solutions. Whereas some providers focus on decoupling storage and computing resources while offering network-based elastic storage, others keep Hadoop's local processing model for high performance at the cost of reduced flexibility. Results also show the importance of application-level tuning and how keeping hardware and software stacks up to date can influence performance even more than replicating the on-premises model in the cloud. This work is partially supported by the Microsoft Azure for Research program, the European Research Council (ERC) under the EU's Horizon 2020 programme (GA 639595), the Spanish Ministry of Education (TIN2015-65316-P), and the Generalitat de Catalunya (2014-SGR-1051). Peer reviewed. Postprint (author's final draft).

    Growing and Serving Large Open-domain Knowledge Graphs

    Applications of large open-domain knowledge graphs (KGs) to real-world problems pose many unique challenges. In this paper, we present extensions to Saga, our platform for continuous construction and serving of knowledge at scale. In particular, we describe a pipeline for training knowledge graph embeddings that powers key capabilities such as fact ranking, fact verification, a related entities service, and support for entity linking. We then describe how our platform, including graph embeddings, can be leveraged to create a Semantic Annotation service that links unstructured Web documents to entities in our KG. Semantic annotation of the Web effectively expands our knowledge graph with edges to open-domain Web content, which can be used in various search and ranking problems. We also leverage annotated Web documents to drive Open-domain Knowledge Extraction. This targeted extraction framework identifies important coverage issues in the KG, then finds relevant data sources for target entities on the Web and extracts missing information to enrich the KG. Finally, we describe adaptations to our knowledge platform needed to construct and serve private personal knowledge on-device. These include private incremental KG construction, cross-device knowledge sync, and global knowledge enrichment. Comment: To be published in SIGMOD 202

    Prevention of pulmonary embolism and deep vein thrombosis with low dose aspirin: Pulmonary Embolism Prevention (PEP) trial

    Evaluation of prognostic risk models for postoperative pulmonary complications in adult patients undergoing major abdominal surgery: a systematic review and international external validation cohort study

    Background: Stratifying risk of postoperative pulmonary complications after major abdominal surgery allows clinicians to modify risk through targeted interventions and enhanced monitoring. In this study, we aimed to identify and validate prognostic models against a new consensus definition of postoperative pulmonary complications. Methods: We did a systematic review and international external validation cohort study. The systematic review was done in accordance with the Preferred Reporting Items for Systematic Reviews and Meta-Analyses guidelines. We searched MEDLINE and Embase on March 1, 2020, for articles published in English that reported on risk prediction models for postoperative pulmonary complications following abdominal surgery. External validation of existing models was done within a prospective international cohort study of adult patients (≥18 years) undergoing major abdominal surgery. Data were collected between Jan 1, 2019, and April 30, 2019, in the UK, Ireland, and Australia. Discriminative ability and prognostic accuracy summary statistics were compared between models for the 30-day postoperative pulmonary complication rate as defined by the Standardised Endpoints in Perioperative Medicine Core Outcome Measures in Perioperative and Anaesthetic Care (StEP-COMPAC). Model performance was compared using the area under the receiver operating characteristic curve (AUROCC). Findings: In total, we identified 2903 records from our literature search, of which 2514 (86·6%) unique records were screened, 121 (4·8%) of 2514 full texts were assessed for eligibility, and 29 unique prognostic models were identified. Nine (31·0%) of 29 models had score development reported only, 19 (65·5%) had undergone internal validation, and only four (13·8%) had been externally validated. Data to validate six eligible models were collected in the international external validation cohort study. Data from 11 591 patients were available, with an overall postoperative pulmonary complication rate of 7·8% (n=903). None of the six models showed good discrimination (defined as AUROCC ≥0·70) for identifying postoperative pulmonary complications, with the Assess Respiratory Risk in Surgical Patients in Catalonia score showing the best discrimination (AUROCC 0·700 [95% CI 0·683–0·717]). Interpretation: In the pre-COVID-19 pandemic data, variability in the risk of pulmonary complications (StEP-COMPAC definition) following major abdominal surgery was poorly described by existing prognostication tools. To improve surgical safety during the COVID-19 pandemic recovery and beyond, novel risk stratification tools are required. Funding: British Journal of Surgery Society.

    Global economic burden of unmet surgical need for appendicitis

    Background: There is a substantial gap in provision of adequate surgical care in many low- and middle-income countries. This study aimed to identify the economic burden of unmet surgical need for the common condition of appendicitis. Methods: Data on the incidence of appendicitis from 170 countries and two different approaches were used to estimate numbers of patients who do not receive surgery: as a fixed proportion of the total unmet surgical need per country (approach 1); and based on country income status (approach 2). Indirect costs with current levels of access and local quality, and those if quality were at the standards of high-income countries, were estimated. A human capital approach was applied, focusing on the economic burden resulting from premature death and absenteeism. Results: Excess mortality was 4185 per 100 000 cases of appendicitis using approach 1 and 3448 per 100 000 using approach 2. The economic burden of continuing current levels of access and local quality was US $92 492 million using approach 1 and $73 141 million using approach 2. The economic burden of not providing surgical care to the standards of high-income countries was $95 004 million using approach 1 and $75 666 million using approach 2. The largest share of these costs resulted from premature death (97.7 per cent) and lack of access (97.0 per cent) in contrast to lack of quality. Conclusion: For a comparatively non-complex emergency condition such as appendicitis, increasing access to care should be prioritized. Although improving quality of care should not be neglected, increasing provision of care at current standards could reduce societal costs substantially.